7 research outputs found

    Predicting Audio Advertisement Quality

    Full text link
    Online audio advertising is a form of advertising used widely in online music streaming services. These platforms tend to host tens of thousands of unique audio advertisements (ads), and serving high-quality ads ensures a better user experience and results in longer user engagement. The automatic assessment of these ads is therefore an important step toward audio ad ranking and better audio ad creation. In this paper we propose one way to measure the quality of audio ads using a proxy metric called Long Click Rate (LCR), defined by the amount of time a user engages with the follow-up display ad (shown while the audio ad is playing) divided by the number of impressions. We then focus on predicting audio ad quality using only acoustic features such as harmony, rhythm, and timbre, extracted from the raw waveform. We discuss how characteristics of the sound can be connected to concepts such as the clarity of the audio ad message, its trustworthiness, etc. Finally, we propose a new deep learning model for audio ad quality prediction, which outperforms the other discussed models trained on hand-crafted features. To the best of our knowledge, this is the first large-scale audio ad quality prediction study. Comment: WSDM '18 Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, 9 pages.
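    The paper's quality proxy, Long Click Rate (LCR), lends itself to a simple computation over impression logs. The Python sketch below illustrates one plausible reading of the definition above: a "long click" is a click on the companion display ad whose dwell time exceeds a threshold, and LCR is the fraction of impressions that yield one. The field names and the 30-second threshold are assumptions for this example, not values from the paper.

from dataclasses import dataclass
from typing import Iterable

# Illustrative threshold; the paper's actual dwell-time cutoff is not stated here.
LONG_CLICK_DWELL_SECONDS = 30.0

@dataclass
class Impression:
    ad_id: str
    clicked: bool          # user clicked the companion display ad
    dwell_seconds: float   # time spent engaged after the click

def long_click_rate(impressions: Iterable[Impression]) -> float:
    """Fraction of impressions whose click led to a 'long' engagement."""
    impressions = list(impressions)
    if not impressions:
        return 0.0
    long_clicks = sum(
        1 for imp in impressions
        if imp.clicked and imp.dwell_seconds >= LONG_CLICK_DWELL_SECONDS
    )
    return long_clicks / len(impressions)

# Example: three impressions of one ad, one of which produced a long click.
ads = [Impression("ad1", True, 45.0), Impression("ad1", True, 5.0), Impression("ad1", False, 0.0)]
print(long_click_rate(ads))  # 0.333...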

    A Data-driven Exploration of Rhythmic Attributes and Style in Music

    No full text
    Humans describe and differentiate songs using three basic components of music: melody, harmony, and rhythm. From these simple components, one can recognize higher-level concepts such as the style and other expressive elements of a piece of music. In this thesis, I explore rhythmic components and their relationships to each other, to genre, and to other geo-cultural factors (e.g., language) through data-driven approaches using audio signals. Working in conjunction with Pandora, I employ a corpus of over 1 million expertly labeled audio examples across many rhythmic styles and genres from their flagship Music Genome Project. Each song is labeled with more than 500 attributes of rhythm, instrumentation, timbre, and genre. In order to model rhythm-related information from audio signals, I implement a set of novel and compact rhythm-specific acoustic features. They represent beat-level and meter-level information as well as elements of rhythmic variation and pulse stability. First, the acoustic features are used to predict the presence of human-annotated attributes of meter and rhythmic feel (e.g., swing). Previous work has studied the general recognition of rhythmic styles in music audio signals, but few efforts have focused on the deconstruction and quantification of the foundational components of global rhythmic structures. Second, I focus on rhythm and its relationship to genre. Genre provides one of the most convenient categorizations of music, but it is often regarded as a poorly defined or largely subjective musical construct. I provide evidence that musical genres can to a large extent be objectively modeled via a combination of musical attributes, with rhythm playing a significant role. Finally, through a set of unsupervised machine learning experiments that employ both the human-labeled attributes and the acoustic features, a set of low-dimensional, perceptually motivated rhythm spaces is designed. These spaces provide grounded and visual insight into the relationships between rhythmic attributes and rhythmic styles. Most previous work strives to automatically predict a specific phenomenon (e.g., genre) without a contextual understanding of why a label is applied. This work is motivated by largely the same idea; however, I aim not only to predict the phenomenon but also to understand the components used to construct it. This opens the door to a more grounded and intuitive understanding of these components and how they interact to create the different styles of music we enjoy. Ph.D., Electrical Engineering -- Drexel University, 201
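    The thesis's rhythm-specific features themselves are not reproduced in the abstract; as a hedged sketch of the kind of beat-level and pulse-stability information it describes, the Python snippet below uses librosa (an assumed toolchain, not necessarily the one used in the thesis) to estimate tempo, beat positions, the spread of inter-beat intervals as a crude pulse-stability proxy, and a tempogram summary of meter-level periodicities.

import numpy as np
import librosa

def rhythm_descriptors(path: str, sr: int = 22050) -> dict:
    """Rough beat-level descriptors; a stand-in for the thesis's features, which are not public."""
    y, sr = librosa.load(path, sr=sr)
    onset_env = librosa.onset.onset_strength(y=y, sr=sr)

    # Global tempo estimate and beat positions from the onset envelope.
    tempo, beats = librosa.beat.beat_track(onset_envelope=onset_env, sr=sr)
    beat_times = librosa.frames_to_time(beats, sr=sr)

    # Inter-beat intervals: their spread is a crude proxy for pulse stability.
    ibis = np.diff(beat_times)
    pulse_stability = 1.0 / (1.0 + float(np.std(ibis))) if ibis.size > 1 else 0.0

    # The tempogram summarizes periodicities over time (meter-level information).
    tempogram = librosa.feature.tempogram(onset_envelope=onset_env, sr=sr)

    return {
        "tempo_bpm": float(np.atleast_1d(tempo)[0]),
        "num_beats": int(len(beats)),
        "pulse_stability": pulse_stability,
        "tempogram_mean": tempogram.mean(axis=1),  # one value per tempo bin
    }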

    End-to-end learning for music audio tagging at scale

    No full text
    Paper presented at: 19th International Society for Music Information Retrieval Conference (ISMIR 2018), held 23–27 September 2018 in Paris, France. The lack of data tends to limit the outcomes of deep learning research, particularly when dealing with end-to-end learning stacks processing raw data such as waveforms. In this study, 1.2M tracks annotated with musical labels are available to train our end-to-end models. This large amount of data allows us to unrestrictedly explore two different design paradigms for music auto-tagging: assumption-free models, using waveforms as input with very small convolutional filters; and models that rely on domain knowledge, using log-mel spectrograms with a convolutional neural network designed to learn timbral and temporal features. Our work focuses on studying how these two types of deep architectures perform when datasets of variable size are available for training: the MagnaTagATune (25k songs), the Million Song Dataset (240k songs), and a private dataset of 1.2M songs. Our experiments suggest that music domain assumptions are relevant when not enough training data are available, and show that waveform-based models outperform spectrogram-based ones in large-scale data scenarios. This work was partially supported by the Maria de Maeztu Units of Excellence Programme (MDM-2015-0502), and we are grateful for the GPUs donated by NVidia.
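    The two design paradigms contrasted above can be pictured as two interchangeable front ends feeding a shared tagging back end. The PyTorch sketch below is only an illustration under assumed layer sizes, not the paper's exact architecture: a waveform branch built from very small 1-D filters, and a log-mel branch whose 2-D filters span most of the frequency axis (timbral) or a long stretch of the time axis (temporal). Either front end's output would typically be pooled over time and passed to dense layers that predict the tag vocabulary.

import torch
import torch.nn as nn

class WaveformFrontEnd(nn.Module):
    """Assumption-free front end: stacked 1-D convolutions with very small
    filters applied directly to the raw waveform (illustrative sizes)."""
    def __init__(self, n_filters: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, n_filters, kernel_size=3, stride=3),  # sample-level filters
            nn.BatchNorm1d(n_filters), nn.ReLU(),
            nn.Conv1d(n_filters, n_filters, kernel_size=3), nn.MaxPool1d(3),
            nn.BatchNorm1d(n_filters), nn.ReLU(),
        )

    def forward(self, waveform: torch.Tensor) -> torch.Tensor:
        # waveform: (batch, 1, samples)
        return self.net(waveform)

class SpectrogramFrontEnd(nn.Module):
    """Domain-knowledge front end: 2-D convolutions over a log-mel spectrogram,
    with frequency-spanning filters for timbre and time-spanning filters for
    temporal patterns (again, only a sketch)."""
    def __init__(self, n_mels: int = 96, n_filters: int = 32):
        super().__init__()
        # Vertical (frequency-spanning) filters capture timbral envelopes.
        self.timbral = nn.Conv2d(1, n_filters, kernel_size=(int(0.9 * n_mels), 7), padding=(0, 3))
        # Horizontal (time-spanning) filters capture temporal/rhythmic patterns.
        self.temporal = nn.Conv2d(1, n_filters, kernel_size=(1, 65), padding=(0, 32))

    def forward(self, log_mel: torch.Tensor) -> torch.Tensor:
        # log_mel: (batch, 1, n_mels, frames); max-pool frequency away, keep time.
        t = torch.relu(self.timbral(log_mel)).max(dim=2).values
        r = torch.relu(self.temporal(log_mel)).max(dim=2).values
        return torch.cat([t, r], dim=1)  # (batch, 2 * n_filters, frames)

# Example shapes: 3 s of audio at 16 kHz, and a 96-band log-mel with 256 frames.
wave = torch.randn(2, 1, 48000)
spec = torch.randn(2, 1, 96, 256)
print(WaveformFrontEnd()(wave).shape, SpectrogramFrontEnd()(spec).shape)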

    End-to-end learning for music audio tagging at scale

    No full text
    Paper presented at: Workshop on Machine Learning for Audio Signal Processing at NIPS 2017 (ML4Audio@NIPS17), held 4–9 December 2017 in Long Beach, California. The lack of data tends to limit the outcomes of deep learning research, especially when dealing with end-to-end learning stacks processing raw data such as waveforms. In this study we make use of musical labels annotated for 1.2 million tracks. This large amount of data allows us to unrestrictedly explore different front-end paradigms: from assumption-free models, using waveforms as input with very small convolutional filters, to models that rely on domain knowledge, using log-mel spectrograms with a convolutional neural network designed to learn temporal and timbral features. Results suggest that while spectrogram-based models surpass their waveform-based counterparts, the difference in performance shrinks as more data are employed. This work is partially supported by the Maria de Maeztu Programme (MDM-2015-0502).
